We study the problem of learning a single occurrence regular expression with interleaving (SOIRE) from a set of noisy text strings. SOIREs have unrestricted support for interleaving and cover most regular expressions used in practice. Learning SOIREs is challenging because it is computationally expensive and because text strings usually contain noise in practice. Most previous work learns only restricted SOIREs and is not robust to noisy data. To tackle these issues, we propose SOIREDL, a noise-tolerant differentiable learning approach for SOIREs. We design a neural network that simulates SOIRE matching of given text strings, and we theoretically prove that a class of parameter sets learnt by the neural network, called faithful encodings, is in one-to-one correspondence with SOIREs of bounded size. Based on this correspondence, we interpret the target SOIRE from the parameters of the neural network by exploring the nearest faithful encodings. Experimental results show that SOIREDL outperforms state-of-the-art approaches, especially on noisy data.